Determine if a unit is providing marginal power


In [1]:
%matplotlib inline
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import gzip
import cPickle as pickle


/Users/Home/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

Method

For every hour, find the change in net load from the previous hour. Net load is defined as total load minus wind generation.

Change in net load

Use ERCOT data on hourly load and hourly wind generation to create Net Load and Net Change columns.


In [2]:
filename = 'ERCOT wind data.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
ercot = pd.read_csv(fullpath, index_col=0)

In [3]:
ercot.head()


Out[3]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change
2007-01-01 00:00:00 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN
2007-01-01 01:00:00 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834
2007-01-01 02:00:00 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777
2007-01-01 03:00:00 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659
2007-01-01 04:00:00 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680

In [4]:
ercot.loc[:,'Net Load (MW)'] = ercot.loc[:,'ERCOT Load, MW'] - ercot.loc[:,'Total Wind Output, MW']

In [5]:
ercot.head()


Out[5]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW)
2007-01-01 00:00:00 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0
2007-01-01 01:00:00 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834 29210.4
2007-01-01 02:00:00 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777 29091.8
2007-01-01 03:00:00 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659 28892.7
2007-01-01 04:00:00 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680 29410.9

In [6]:
ercot.loc[1:,'Net Load Change (MW)'] = ercot.iloc[1:,-1].values - ercot.iloc[:-1,-1].values

In [7]:
ercot.loc[:,'DATETIME'] = pd.to_datetime(ercot.index)

In [8]:
ercot.head()


Out[8]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW) DATETIME
2007-01-01 00:00:00 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007-01-01 00:00:00
2007-01-01 01:00:00 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834 29210.4 -143.6 2007-01-01 01:00:00
2007-01-01 02:00:00 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777 29091.8 -118.6 2007-01-01 02:00:00
2007-01-01 03:00:00 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659 28892.7 -199.1 2007-01-01 03:00:00
2007-01-01 04:00:00 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680 29410.9 518.2 2007-01-01 04:00:00

EPA data

Calculate the change in generation for each plant in every hour. By summing the change across all plants in a group (cluster), we will have the change in generation for that group from the previous hour to the current hour.


In [9]:
# load a zipped pickle file 
# from http://stackoverflow.com/questions/18474791/decreasing-the-size-of-cpickle-objects
def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = pickle.load(f)
        return loaded_object

In [10]:
filename = 'EPA hourly dictionary.pgz'
path = '../Clean Data'
fullpath = os.path.join(path, filename)

epaDict = load_zipped_pickle(fullpath)

In [11]:
epaDict.keys()


Out[11]:
['2012 Jan-June',
 '2007 July-Dec',
 '2013 Jan-June',
 '2008 July-Dec',
 '2011 Jan-June',
 '2010 Jan-June',
 '2010 July-Dec',
 '2009 July-Dec',
 '2015 Jan-June',
 '2015 July-Dec',
 '2012 July-Dec',
 '2007 Jan-June',
 '2008 Jan-June',
 '2011 July-Dec',
 '2014 Jan-June',
 '2014 July-Dec',
 '2009 Jan-June',
 '2013 July-Dec']

In [12]:
def plant_gen_delta(df):
    """
    For every plant in the input df, calculate the change in gross load (MW)
    from the previous hour.
    
    input:
        df: dataframe of EPA clean air markets data
    return:
        df: concatanated list of dataframes
    """
    df_list = []
    for plant in df['PLANT_ID'].unique():
        temp = df.loc[df['PLANT_ID'] == plant,:]
        gen_change = temp.loc[:,'GROSS LOAD (MW)'].values - temp.loc[:,'GROSS LOAD (MW)'].shift(1).values
        temp.loc[:,'Gen Change (MW)'] = gen_change
        df_list.append(temp)
    return pd.concat(df_list)

In [13]:
allEPA = pd.concat(epaDict)

In [14]:
allEPA.fillna(0, inplace=True)
allEPA.head()


Out[14]:
CO2 (SHORT TONS) COUNTY DATE DATETIME EPA REGION FACILITY LATITUDE FACILITY LONGITUDE FACILITY NAME GROSS LOAD (MW) HEAT INPUT (MMBTU) HOUR NERC REGION NOX (POUNDS) PLANT_ID SO2 (POUNDS) STATE STEAM LOAD (1000LB/HR) YEAR
2007 Jan-June 0 0.0 Harris 2007-01-01 2007-01-01 00:00:00 6 29.7239 -95.2258 AES Western Power, LLC 0.0 0.0 0 ERCOT 0.0 3461 0.0 TX 0.0 2007
1 0.0 Harris 2007-01-01 2007-01-01 01:00:00 6 29.7239 -95.2258 AES Western Power, LLC 0.0 0.0 1 ERCOT 0.0 3461 0.0 TX 0.0 2007
2 0.0 Harris 2007-01-01 2007-01-01 02:00:00 6 29.7239 -95.2258 AES Western Power, LLC 0.0 0.0 2 ERCOT 0.0 3461 0.0 TX 0.0 2007
3 0.0 Harris 2007-01-01 2007-01-01 03:00:00 6 29.7239 -95.2258 AES Western Power, LLC 0.0 0.0 3 ERCOT 0.0 3461 0.0 TX 0.0 2007
4 0.0 Harris 2007-01-01 2007-01-01 04:00:00 6 29.7239 -95.2258 AES Western Power, LLC 0.0 0.0 4 ERCOT 0.0 3461 0.0 TX 0.0 2007

In [15]:
allEPA = plant_gen_delta(allEPA)


/Users/Home/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:296: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/Users/Home/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py:476: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s

In [16]:
allEPA.reset_index(drop=True, inplace=True)

In [17]:
allEPA.tail()


Out[17]:
CO2 (SHORT TONS) COUNTY DATE DATETIME EPA REGION FACILITY LATITUDE FACILITY LONGITUDE FACILITY NAME GROSS LOAD (MW) HEAT INPUT (MMBTU) HOUR NERC REGION NOX (POUNDS) PLANT_ID SO2 (POUNDS) STATE STEAM LOAD (1000LB/HR) YEAR Gen Change (MW)
4511875 395.801 Brazoria 2015-12-31 2015-12-31 19:00:00 6 29.0728 -95.745 Sweeny Cogeneration Facility 422.0 6660.4 19 ERCOT 174.4 55015 3.358 TX 0.0 2015 -1.0
4511876 397.101 Brazoria 2015-12-31 2015-12-31 20:00:00 6 29.0728 -95.745 Sweeny Cogeneration Facility 423.0 6681.5 20 ERCOT 153.7 55015 3.378 TX 0.0 2015 1.0
4511877 395.601 Brazoria 2015-12-31 2015-12-31 21:00:00 6 29.0728 -95.745 Sweeny Cogeneration Facility 420.0 6655.3 21 ERCOT 162.9 55015 3.364 TX 0.0 2015 -3.0
4511878 401.501 Brazoria 2015-12-31 2015-12-31 22:00:00 6 29.0728 -95.745 Sweeny Cogeneration Facility 420.0 6756.3 22 ERCOT 175.4 55015 3.383 TX 0.0 2015 0.0
4511879 403.300 Brazoria 2015-12-31 2015-12-31 23:00:00 6 29.0728 -95.745 Sweeny Cogeneration Facility 422.0 6787.3 23 ERCOT 156.4 55015 3.402 TX 0.0 2015 2.0

Generation change by group/cluster

Clusters are determined using k-means.


In [18]:
cluster_df = pd.read_csv('Cluster labels.csv')

In [19]:
merged_epa_cluster = pd.merge(allEPA, cluster_df, left_on=['PLANT_ID', 'YEAR'], 
                             right_on=['plant_id', 'year'])

In [20]:
merged_epa_cluster.head()


Out[20]:
CO2 (SHORT TONS) COUNTY DATE DATETIME EPA REGION FACILITY LATITUDE FACILITY LONGITUDE FACILITY NAME GROSS LOAD (MW) HEAT INPUT (MMBTU) ... PLANT_ID SO2 (POUNDS) STATE STEAM LOAD (1000LB/HR) YEAR Gen Change (MW) Unnamed: 0 year plant_id cluster_id_6
0 0.0 Nueces 2007-01-01 2007-01-01 00:00:00 6 27.6067 -97.3119 Barney M. Davis 0.0 0.0 ... 4939 0.0 TX 0.0 2007 NaN 57 2007 4939 4
1 0.0 Nueces 2007-01-01 2007-01-01 01:00:00 6 27.6067 -97.3119 Barney M. Davis 0.0 0.0 ... 4939 0.0 TX 0.0 2007 0.0 57 2007 4939 4
2 0.0 Nueces 2007-01-01 2007-01-01 02:00:00 6 27.6067 -97.3119 Barney M. Davis 0.0 0.0 ... 4939 0.0 TX 0.0 2007 0.0 57 2007 4939 4
3 0.0 Nueces 2007-01-01 2007-01-01 03:00:00 6 27.6067 -97.3119 Barney M. Davis 0.0 0.0 ... 4939 0.0 TX 0.0 2007 0.0 57 2007 4939 4
4 0.0 Nueces 2007-01-01 2007-01-01 04:00:00 6 27.6067 -97.3119 Barney M. Davis 0.0 0.0 ... 4939 0.0 TX 0.0 2007 0.0 57 2007 4939 4

5 rows × 23 columns


In [21]:
grouped_clusters = merged_epa_cluster.loc[:,['Gen Change (MW)', 'GROSS LOAD (MW)', 'DATETIME', 'cluster_id_6']].groupby(['DATETIME', 'cluster_id_6']).sum()

In [22]:
grouped_clusters.reset_index(inplace=True)

In [23]:
grouped_clusters.dtypes


Out[23]:
DATETIME           datetime64[ns]
cluster_id_6                int64
Gen Change (MW)           float64
GROSS LOAD (MW)           float64
dtype: object

In [24]:
grouped_clusters_ercot = pd.merge(grouped_clusters, ercot, on='DATETIME')

In [25]:
grouped_clusters_ercot.loc[:,'year'] = grouped_clusters_ercot.loc[:,'DATETIME'].apply(lambda x: x.year)
grouped_clusters_ercot.head()


Out[25]:
DATETIME cluster_id_6 Gen Change (MW) GROSS LOAD (MW) ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW) year
0 2007-01-01 0 NaN 4596.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007
1 2007-01-01 1 NaN 4614.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007
2 2007-01-01 2 NaN 124.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007
3 2007-01-01 3 NaN 126.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007
4 2007-01-01 4 NaN 537.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 2007

In [100]:
g = sns.FacetGrid(grouped_clusters_ercot, col='year', col_wrap=3, hue='cluster_id_6')
g.map(plt.scatter, 'Net Load Change (MW)', 'Gen Change (MW)', facecolors='None')
g.set_xticklabels(rotation=30)
g.add_legend()


Out[100]:
<seaborn.axisgrid.FacetGrid at 0x15060b350>
Out[100]:
<seaborn.axisgrid.FacetGrid at 0x15060b350>
Out[100]:
<seaborn.axisgrid.FacetGrid at 0x15060b350>

Generally the figure below shows that each cluster tends to behave somewhat linearly (change in generation over the hour vs change in net load), but there are a couple exceptions.

  • Cluster 4 looks like it might have 2 distinct groups in it. See if this is still true after we go back and check all calculations.
  • Same for Cluster 3, but not as strong of an effect.

In [109]:
sns.lmplot('Net Load Change (MW)', 'Gen Change (MW)', data=filtered_data,
           col='year', row='cluster_id_6', hue='cluster_id_6', robust=True, ci=None)


Out[109]:
<seaborn.axisgrid.FacetGrid at 0x1b9594350>

The figure below shows how/if ramping changes as wind generation increases from 2007 to 2015. Cluster 3 is especially interesting, because it contains high-efficiency NG plants with average capacity just under 1 GW. Over time it starts ramping up (and down) by larger amounts . Some other clusters show similar behavior, while other (like 0, 1, 2) don't change nearly as much.


In [102]:
filtered_data = grouped_clusters_ercot.loc[grouped_clusters_ercot['year'].isin([2007, 2011, 2015])]

g = sns.FacetGrid(filtered_data, col='year', row='cluster_id_6')
g.map(plt.scatter, 'Total Wind Output, MW', 'Gen Change (MW)', facecolors='None')
g.set_xticklabels(rotation=30)
g.add_legend()


Out[102]:
<seaborn.axisgrid.FacetGrid at 0x156082b50>
Out[102]:
<seaborn.axisgrid.FacetGrid at 0x156082b50>
Out[102]:
<seaborn.axisgrid.FacetGrid at 0x156082b50>

Getting Total Capacity per Cluster


In [26]:
#Load EIA 860 from picle
eia860fname = "EIA 860.pkl"
eia860Full = os.path.join(path, eia860fname)
eia860Dict = pickle.load(open(eia860Full, "rb"))

#Add year to as a column
for k in eia860Dict.keys():
    eia860Dict[k]["Year"] = k

#Flatten dictionary, rename columns, and do inner join 
merged860 = pd.concat(eia860Dict)
merged860.columns = ["plant_id", "nameplate_capacity", "year"]
merged860 = cluster_df.merge(merged860, on=["plant_id", "year"])

groupCapacity = merged860.loc[:,["cluster_id_6", "year", "nameplate_capacity"]].groupby(by=["cluster_id_6", "year"]).sum()

In [27]:
groupCapacity.head()


Out[27]:
nameplate_capacity
cluster_id_6 year
0 2007 5949.0
2008 3969.0
2009 4008.4
2010 5988.4
2011 4008.4

In [29]:
groupCapacity.reset_index(inplace=True)

In [30]:
groupCapacity.head()


Out[30]:
cluster_id_6 year nameplate_capacity
0 0 2007 5949.0
1 0 2008 3969.0
2 0 2009 4008.4
3 0 2010 5988.4
4 0 2011 4008.4

Merge with grouped_clusters_ercot


In [31]:
# Yes, I realize this is a horrible name
grouped_clusters_ercot_capacity = pd.merge(groupCapacity, grouped_clusters_ercot, on=['cluster_id_6', 'year'])

In [32]:
grouped_clusters_ercot_capacity.head()


Out[32]:
cluster_id_6 year nameplate_capacity DATETIME Gen Change (MW) GROSS LOAD (MW) ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW)
0 0 2007 5949.0 2007-01-01 00:00:00 NaN 4596.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN
1 0 2007 5949.0 2007-01-01 01:00:00 -30.0 4566.0 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834 29210.4 -143.6
2 0 2007 5949.0 2007-01-01 02:00:00 101.0 4667.0 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777 29091.8 -118.6
3 0 2007 5949.0 2007-01-01 03:00:00 1.0 4668.0 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659 28892.7 -199.1
4 0 2007 5949.0 2007-01-01 04:00:00 17.0 4685.0 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680 29410.9 518.2

In [33]:
grouped_clusters_ercot_capacity.describe()


Out[33]:
cluster_id_6 year nameplate_capacity Gen Change (MW) GROSS LOAD (MW) ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW)
count 473334.000000 473334.000000 473334.000000 473328.000000 473334.000000 473334.000000 473334.000000 473334.000000 473334.000000 473334.000000 473280.000000 420720.000000 473334.000000 473328.000000
mean 2.500000 2010.999442 7196.330644 -0.012402 2636.120820 37013.204232 9312.042054 2945.984653 30.975796 8.510844 0.036198 3.771452 34067.219580 0.018802
std 1.707827 2.581818 4058.496386 240.003813 2370.098557 9016.268746 3096.526857 2244.330501 19.905077 6.951802 442.976558 290.847217 9478.982475 1914.872580
min 0.000000 2007.000000 2602.100000 -2877.000000 0.000000 19645.310547 2790.000000 -8.318962 -0.162575 -0.027421 -3243.023247 -35200.000000 14995.058576 -32967.731770
25% 1.000000 2009.000000 4008.400000 -57.000000 564.250000 30804.125000 8181.000000 1156.001275 13.862967 3.034220 -210.400451 -9.952249 27500.339356 -1218.689494
50% 2.500000 2011.000000 5256.500000 0.000000 2158.000000 34826.570312 9430.400000 2469.099226 28.940298 6.757363 -5.400000 -0.361188 32092.907262 -39.400000
75% 4.000000 2013.000000 10014.800000 58.000000 4052.000000 41746.316406 11064.400000 4276.593214 46.008671 12.334329 198.052032 10.844909 39093.334216 1210.354252
max 5.000000 2015.000000 16228.900000 3243.000000 11607.000000 69820.960940 16170.000000 13812.025390 92.010060 44.303584 4044.954171 56900.000000 68484.264158 32284.047237

Add fuel price data


In [34]:
filename = 'Fuel prices.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
fuel_prices = pd.read_csv(fullpath, index_col=0)

In [35]:
fuel_prices.head()


Out[35]:
NG Price ($/mcf) Month Year All coal Lignite Subbituminous
0 6.42 1 2007 25.1475 20.0275 28.115
1 7.34 2 2007 25.1475 20.0275 28.115
2 6.90 3 2007 25.1475 20.0275 28.115
3 7.29 4 2007 25.1475 20.0275 28.115
4 7.51 5 2007 25.1475 20.0275 28.115

In [36]:
grouped_clusters_ercot_capacity.loc[:,'Month'] = grouped_clusters_ercot_capacity.loc[:,'DATETIME'].apply(lambda x: x.month)

# Make 'year' capitalization the same
grouped_clusters_ercot_capacity = grouped_clusters_ercot_capacity.rename(columns = {'year':'Year'})

In [51]:
X = pd.merge(grouped_clusters_ercot_capacity, fuel_prices, on=['Year', 'Month'])

In [52]:
X.head()


Out[52]:
cluster_id_6 Year nameplate_capacity DATETIME Gen Change (MW) GROSS LOAD (MW) ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW) Month NG Price ($/mcf) All coal Lignite Subbituminous
0 0 2007 5949.0 2007-01-01 00:00:00 NaN 4596.0 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN 29354.0 NaN 1 6.42 25.1475 20.0275 28.115
1 0 2007 5949.0 2007-01-01 01:00:00 -30.0 4566.0 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834 29210.4 -143.6 1 6.42 25.1475 20.0275 28.115
2 0 2007 5949.0 2007-01-01 02:00:00 101.0 4667.0 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777 29091.8 -118.6 1 6.42 25.1475 20.0275 28.115
3 0 2007 5949.0 2007-01-01 03:00:00 1.0 4668.0 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659 28892.7 -199.1 1 6.42 25.1475 20.0275 28.115
4 0 2007 5949.0 2007-01-01 04:00:00 17.0 4685.0 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680 29410.9 518.2 1 6.42 25.1475 20.0275 28.115

In [53]:
X.tail()


Out[53]:
cluster_id_6 Year nameplate_capacity DATETIME Gen Change (MW) GROSS LOAD (MW) ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Net Load (MW) Net Load Change (MW) Month NG Price ($/mcf) All coal Lignite Subbituminous
473329 5 2015 11476.0 2015-12-31 19:00:00 -20.0 7516.0 39908.77734 16170.0 3824.932373 23.654498 9.584188 483.971191 14.485987 36083.844967 -1172.357911 12 2.23 28.53 24.04 32.55
473330 5 2015 11476.0 2015-12-31 20:00:00 -964.0 6552.0 38736.85938 16170.0 4625.632813 28.606264 11.941166 800.700440 20.933715 34111.226567 -1972.618400 12 2.23 28.53 24.04 32.55
473331 5 2015 11476.0 2015-12-31 21:00:00 -608.0 5944.0 37587.70313 16170.0 4957.714844 30.659956 13.189725 332.082031 7.179170 32629.988286 -1481.238281 12 2.23 28.53 24.04 32.55
473332 5 2015 11476.0 2015-12-31 22:00:00 -246.0 5698.0 36356.26172 16170.0 4699.097656 29.060592 12.925140 -258.617188 -5.216460 31657.164064 -972.824222 12 2.23 28.53 24.04 32.55
473333 5 2015 11476.0 2015-12-31 23:00:00 -333.0 5365.0 35150.33984 16170.0 4313.125000 26.673624 12.270507 -385.972656 -8.213761 30837.214840 -819.949224 12 2.23 28.53 24.04 32.55

In [54]:
y = X.loc[:,['DATETIME', 'cluster_id_6', 'Gen Change (MW)']]

In [55]:
y.head()


Out[55]:
DATETIME cluster_id_6 Gen Change (MW)
0 2007-01-01 00:00:00 0 NaN
1 2007-01-01 01:00:00 0 -30.0
2 2007-01-01 02:00:00 0 101.0
3 2007-01-01 03:00:00 0 1.0
4 2007-01-01 04:00:00 0 17.0

In [56]:
y.describe()


Out[56]:
cluster_id_6 Gen Change (MW)
count 473334.000000 473328.000000
mean 2.500000 -0.012402
std 1.707827 240.003813
min 0.000000 -2877.000000
25% 1.000000 -57.000000
50% 2.500000 0.000000
75% 4.000000 58.000000
max 5.000000 3243.000000

In [57]:
X.drop('Gen Change (MW)', axis=1, inplace=True)

Export final dataframes


In [58]:
X_fn = 'X.csv'
y_fn = 'y.csv'
X_path = os.path.join(path, X_fn)
y_path = os.path.join(path, y_fn)

X.to_csv(X_path)
y.to_csv(y_path)

Other data we still need in this DF!!!

Still need the total capacity of each cluster at that hour (determined by year).

Anything else?

X (for every cluster and every hour):

  • Total installed wind
  • Total wind generation
  • Change in net demand (demand change + wind change) from previous hour
  • Total capacity of the cluster
  • Current generation of the cluster
  • Natural gas price (monthly)
  • Coal price (quarterly, data is not in format I hoped for)
    • We either need to use a price for "All coal" (both SUB and LIG), or figure out the % heat input for each cluster. I'm fine either leaving out coal price to start or just using "All coal".

If we have time and need to improve performance, maybe add change in cluster generation over earlier timesteps (e.g. from two hours ago to one hour ago).

y (for every cluster and every hour)

  • Change in cluster generation from previous hour

Export EPA with change in generation

Don't think we need this anymore


In [ ]:
filename = 'EPA with hourly gen change.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
allEPA.to_csv(fullpath)

In early years the generation change is almost independent of the demand change. This slowly shifts over time, so that the generation is correlated with

Plot the change in generation for plant ids 298 & 3439


In [98]:
plants = allEPA.loc[allEPA['PLANT_ID'].isin([298, 3439]),:]

In [99]:
plants.tail()


Out[99]:
CO2 (SHORT TONS) COUNTY DATE DATETIME EPA REGION FACILITY LATITUDE FACILITY LONGITUDE FACILITY NAME GROSS LOAD (MW) HEAT INPUT (MMBTU) HOUR NERC REGION NOX (POUNDS) PLANT_ID SO2 (POUNDS) STATE STEAM LOAD (1000LB/HR) YEAR Gen Change (MW)
1617163 1504.3 Limestone 2015-12-31 2015-12-31 19:00:00 6 31.4219 -96.2525 Limestone 1552.0 13816.9 19 ERCOT 2418.4 298 5165.4 TX 0.0 2015 -39.0
1617164 1080.6 Limestone 2015-12-31 2015-12-31 20:00:00 6 31.4219 -96.2525 Limestone 1103.0 9925.9 20 ERCOT 1746.0 298 3614.5 TX 0.0 2015 -449.0
1617165 904.5 Limestone 2015-12-31 2015-12-31 21:00:00 6 31.4219 -96.2525 Limestone 903.0 8308.0 21 ERCOT 1690.4 298 2987.5 TX 0.0 2015 -200.0
1617166 913.6 Limestone 2015-12-31 2015-12-31 22:00:00 6 31.4219 -96.2525 Limestone 902.0 8391.6 22 ERCOT 1644.1 298 3046.2 TX 0.0 2015 -1.0
1617167 916.4 Limestone 2015-12-31 2015-12-31 23:00:00 6 31.4219 -96.2525 Limestone 901.0 8417.2 23 ERCOT 1575.6 298 3145.4 TX 0.0 2015 -1.0

In [32]:
plants.index = plants['DATETIME']

In [38]:
test = pd.merge(ercot, plants, left_index=True, right_index=True)

In early years the generation change is almost independent of the demand change. This slowly shifts over time, so that the generation is correlated with


In [40]:
g = sns.FacetGrid(test[test['PLANT_ID']==298], col='YEAR', col_wrap=3)
g.map(sns.regplot, 'Net Change', 'Gen Change')
g.set_xticklabels(rotation=30)
plt.suptitle('Coal plant 298', y=1.02, size=15)


Out[40]:
<seaborn.axisgrid.FacetGrid at 0x12589df50>
Out[40]:
<seaborn.axisgrid.FacetGrid at 0x12589df50>
Out[40]:
<matplotlib.text.Text at 0x14bb1fb10>

In [35]:
g = sns.FacetGrid(test[test['PLANT_ID']==3439], col='YEAR', col_wrap=3)
g.map(sns.regplot, 'Net Change', 'Gen Change')
g.set_xticklabels(rotation=30)
plt.suptitle('Natural gas plant 3439', y=1.02, size=15)


Out[35]:
<seaborn.axisgrid.FacetGrid at 0x11a562890>
Out[35]:
<seaborn.axisgrid.FacetGrid at 0x11a562890>
Out[35]:
<matplotlib.text.Text at 0x119c0bd90>

In [ ]: